Generating Natural Language from Linked Data: Unsupervised template extraction

Authors

  • Daniel Duma
  • Ewan Klein

Abstract

We propose an architecture for generating natural language from Linked Data that automatically learns sentence templates and statistical document planning from parallel RDF datasets and text. We have built a proof-of-concept system (LOD-DEF) trained on un-annotated text from the Simple English Wikipedia and RDF triples from DBpedia, focusing exclusively on factual, non-temporal information. The goal of the system is to generate short descriptions, equivalent to Wikipedia stubs, of entities found in Linked Datasets. We have evaluated the LOD-DEF system against a simple generate-from-triples baseline and against human-generated output. In human evaluation, LOD-DEF significantly outperforms the baseline on two of three measures: non-redundancy, and structure and coherence.
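The abstract does not describe how templates are learned from the parallel data. As a rough illustration only, the sketch below assumes the simplest possible alignment: in a sentence about an entity, any exact mention of the entity's name or of an object value from one of its triples is replaced by a slot named after the predicate, and the resulting template can then be filled with another entity's triples. The function names `extract_template` and `fill_template`, the `[predicate]` slot syntax, the use of plain-text labels instead of DBpedia URIs, and the example triples are all hypothetical; exact string matching is an assumption, not necessarily the method LOD-DEF uses.

```python
import re
from typing import Dict, List, Optional, Tuple

# (subject, predicate, object); plain-text labels stand in for DBpedia URIs
# (an assumption made to keep the sketch small).
Triple = Tuple[str, str, str]

# Hypothetical slot syntax: [name] for the entity, [predicate] for a fact.
SLOT = re.compile(r"\[(\w+)\]")


def extract_template(sentence: str, entity: str,
                     triples: List[Triple]) -> Tuple[str, List[str]]:
    """Replace exact mentions of the entity and its object values with slots.

    Returns the template and the predicates it expresses, in sentence order.
    """
    slots: Dict[str, str] = {entity: "name"}
    for subj, pred, obj in triples:
        if subj == entity:
            slots.setdefault(obj, pred)
    # Longest values first, so "New York City" is preferred over "York".
    values = sorted(slots, key=len, reverse=True)
    # A single pass over the sentence, so text already turned into a slot is
    # never matched again; the lookarounds stop matches inside longer words.
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(v) for v in values) + r")(?!\w)")
    used: List[str] = []

    def to_slot(match) -> str:
        pred = slots[match.group(1)]
        used.append(pred)
        return "[" + pred + "]"

    template = pattern.sub(to_slot, sentence)
    return template, [p for p in used if p != "name"]


def fill_template(template: str, entity: str,
                  triples: List[Triple]) -> Optional[str]:
    """Realise a template for an entity, or return None if a slot has no value."""
    values: Dict[str, str] = {"name": entity}
    for subj, pred, obj in triples:
        if subj == entity:
            values.setdefault(pred, obj)
    if any(key not in values for key in SLOT.findall(template)):
        return None
    return SLOT.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    training = [("Ada Lovelace", "birthPlace", "London"),
                ("Ada Lovelace", "occupation", "mathematician")]
    template, preds = extract_template(
        "Ada Lovelace was born in London and worked as a mathematician.",
        "Ada Lovelace", training)
    print(template)  # [name] was born in [birthPlace] and worked as a [occupation].
    print(preds)     # ['birthPlace', 'occupation']
    unseen = [("Marie Curie", "birthPlace", "Warsaw"),
              ("Marie Curie", "occupation", "physicist")]
    print(fill_template(template, "Marie Curie", unseen))
    # Marie Curie was born in Warsaw and worked as a physicist.
```

Exact matching is brittle: a date or number formatted differently in DBpedia and in the text will not align, and a filled template inherits fixed wording such as the article "a", which would be wrong for an occupation like "engineer". The statistical document planning mentioned in the abstract (choosing and ordering templates) is not covered by this sketch.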

Similar resources

Unsupervised Parsing for Generating Surface-Based Relation Extraction Patterns

Finding the right features and patterns for identifying relations in natural language is one of the most pressing research questions for relation extraction. In this paper, we compare patterns based on supervised and unsupervised syntactic parsing and present a simple method for extracting surface patterns from a parsed training set. Results show that the use of surface-based patterns not only i...

متن کامل

A Survey of Unsupervised Techniques for Web Data Extraction

The World Wide Web contains a large amount of data, and fetching important information from the web has become a useful task. Many web information extraction systems have been developed, and they are categorised as manual, supervised, semi-supervised and unsupervised techniques. We will study unsupervised techniques and how they differ from each other. Roadrunner uses the match algorithm for generating the wrapper...

متن کامل

Trinity: Unsupervised Web Data Extraction Using Ternary Trees

The Internet presents a huge collection of useful information, so extracting information from web documents has become a research area for which web data extractors are used. This technique works on two or more web documents generated by the same server-side template, learns a regular expression that models it, and then uses it to extract data from similar documents. The technique introdu...

متن کامل

Experiments in Linear Template Combination using Genetic Algorithms

Natural Language Generation systems typically have two parts: strategic ("what to say") and tactical ("how to say"). We present our experiments in building an unsupervised, corpus-driven, template-based tactical NLG system. We consider templates as a sequence of words containing gaps. Our idea is based on the observation that templates are grammatical locally (within their textual span). We ...

متن کامل

An Unsupervised Approach to Domain-Specific Term Extraction

Domain-specific terms provide vital semantic information for many natural language processing (NLP) tasks and applications, but remain a largely untapped resource in the field. In this paper, we propose an unsupervised method to extract domain-specific terms from the Reuters document collection using term frequency and inverse document frequency.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

Publication date: 2013